Abstract

The Consolidated Standards of Reporting Trials (CONSORT) was introduced in 1996 to improve the methodological 
quality of published reports of randomised controlled trials. By doing a systematic review of randomised controlled 
trials on reproductive surgery, our group can demonstrate that the overall quality of the published reports of randomised 
studies on reproductive surgical interventions has improved after CONSORT. Nevertheless, some problems still 
remain. By discussing the benefits and pitfalls of randomised trials in reproductive surgery, our opinion paper aims 
to stimulate the reader's further interest in evidence-based practice in reproductive surgery.

Introduction

Traditionally, a wooden spoon was given every year
to the student with the lowest score in the comprehensive
mathematics examination at St John's College,
Cambridge University. It was awarded for the
last time in 1909. Its possession implied that its
owner was better equipped to be a cook than
a scholar.

In 1979 Archie Cochrane awarded a wooden
spoon to Obstetrics and Gynaecology because the
uptake of randomised controlled trials
(RCTs) in this discipline was almost non-existent.
Some time before, he had criticized the medical
profession by writing that "we have not organised a
critical summary, by specialty or subspecialty, updated
periodically, of all relevant randomised controlled
trials" (Cochrane, 1979). Initially, Cochrane's
challenge was taken up in perinatal medicine. In the
field of reproductive medicine, the first systematic
review of the effectiveness of subfertility treatments
was published in 1993 (Vandekerckhove et al.,
1993).

In surgery a "non-evidence-based" approach to
practice has traditionally prevailed (Johnson et
al., 2008). The latest surgical technique is often
embraced by the clinical community either when it
seems rational or revolutionary or whenever it
demonstrates the technical skill of the surgeon.

In this opinion paper we will present data and 
some conclusions on the current methodological 
quality of published reports of randomised studies in 
reproductive surgery. By discussing the benefits and 
pitfalls of RCTs in reproductive surgery, we aim to 
stimulate the reader's interest in evidence-based 
practice in reproductive surgery.

Evidence-based medicine

"Evidence-based medicine is the conscientious, 
explicit, and judicious use of current best evidence 
in making decisions about the care of individual 
patients" (Sackett et al, 1996). The practice of 
evidence-based medicine (EBM) stands for the inte- 
gration of individual clinical expertise with the best 
available external clinical evidence from systematic 
research. The essentials of EBM include five consecutive
steps: first, to ask the right questions;
secondly, to find the best level of evidence available;
thirdly, to appraise the evidence critically for risk of
bias, clinical relevance and applicability; fourthly, to
implement the results of the appraisal in everyday
clinical practice; and fifthly, to evaluate the changes
in practice (Farquhar and Vail, 2006). The highest
level of evidence is derived from well-written, critically
appraised systematic reviews of RCTs. The
randomised controlled trial is generally accepted as
the least biased measure of the effectiveness
of interventions. Although observational studies are
considered vastly superior to RCTs in detecting
adverse events, e.g. surgical complications, they are
often misleading when they are used to search
for moderate treatment benefits. Systematic reviews
comparing observational studies with
randomised trials of the same interventions for the
same conditions in the same study populations
concluded that the former were clearly unreliable
and consistently overestimated the treatment effect
(Britton et al., 1998; Kunz et al., 2004).

RCTs in surgery: the benefits 

Gynaecology has evolved into a specialty in
which interventions are increasingly exposed to
the gold standard of RCTs (Johnson et al., 2003). An
overview of 23 systematic reviews including 94
gynaecological surgical trials in the Cochrane Database
of Systematic Reviews (CDSR) (Selman et al., 2008)
concluded that the quality of the RCTs has improved
significantly since the Consolidated Standards of
Reporting Trials (CONSORT) was introduced in 1996
(Begg et al., 1996). Using meta-regression analysis the
authors demonstrated that the proportion of studies
reporting allocation concealment increased significantly
after the introduction of the CONSORT
statement (60% versus 26%, p = 0.002). In parallel,
a reduction in the magnitude of the effect estimate
was observed over time (log of the ratio of odds
ratios per year 0.96, 95% CI 0.93-0.99, p = 0.05)
together with a trend towards higher precision of the
estimation of the treatment effect (inverse of variance
of the log odds ratio 0.12, 95% CI 0.02-0.23,
p = 0.03) (Selman et al., 2008). In a second overview of
30 reviews in the CDSR, the same group of authors
found that only 7 out of 30 reviews reported evidence
of a significant effect; 11 out of 30 reviews concluded
that there was some evidence of significant
effects for primary outcomes along with some evidence
gaps, while in the remaining 12 the authors
found insufficient evidence of effectiveness (Johnson
et al., 2008).

In conclusion, apart from providing up-to-date
unbiased evidence on health care interventions,
systematic reviews of RCTs can identify 'gaps of 
knowledge' where there is insufficient or no evi- 
dence at all. Several knowledge gaps in the evidence 
for fertility treatment have already been identified in 
a review of RCTs from the Cochrane Menstrual 
Disorders and Subfertility Group (MDSG) database 
(Johnson et al., 2003). 

RCTs in surgery: the problems and pitfalls 

There are two major categories of methodological 
challenges that need to be at least identified if not 
solved during the design phase of RCTs on surgical 
interventions (McCulloch et al., 2002; McLeod, 
1999). 

The first category concerns the design
and conduct of surgical trials. The surgical learning
curve raises an interesting dilemma for the timing of
surgical trials: it is well known that an individual
surgeon's complication rates fall significantly as the
procedure is carried out on more and more patients.
While drugs in trials work the same regardless of the
competence of the prescribing physician, there are
surgeon-to-surgeon differences in the preferences for
and the expertise in performing different surgical
procedures (Devereaux et al., 2005). A recent
Cochrane review on the effectiveness of excisional
versus ablative surgery for ovarian endometriomata
demonstrated an effect favoring excision of the cyst
wall over its drainage and ablation: the odds of a
spontaneous pregnancy at 12 months were higher after
excision of the endometriotic cysts than in the control
group treated by drainage and ablation (OR 5.2, 95%
CI 1.9-14) (Hart et al., 2011). One of the included
trials provided evidence for a treatment effect in
favor of the excision technique for the spontaneous
pregnancy rate at 12 months (OR 4.8, 95% CI 1.6-14,
62 patients) (Alborzi et al., 2004), while another
smaller trial demonstrated a higher point estimate
favoring excision over ablation but failed to reach
statistical significance (OR 8.0, 95% CI 0.69-93, 26
patients) (Beretta et al., 1998). In the former trial,
the intervention was performed by the same surgeon
in two university centres, whereas in the latter no
information on the number of surgeons involved was
available. Additionally, neither report gave information
on the expertise of the performing surgeons.
Head-to-head comparisons between different surgical
techniques require that the same surgeon is equally
comfortable with both techniques and is an expert in
performing both of them. This is difficult in practice
and virtually impossible to guarantee within a trial.
Therefore a strong case can be made
for "expertise-based" trials in which consenting patients
are allocated to different expert surgeons, each of
whom carries out the procedure he or she prefers and
is expert in performing. While this improves the internal
validity of the trial, it potentially diminishes its
external validity, meaning that the results of
the RCT cannot be generalised without caution.

In the same context, the application of both techniques
in the trials mentioned above raises further
considerations: how sure can we be that both
surgical teams were using comparable techniques?
Did they selectively coagulate visible endometriotic
lesions or was the whole cyst wall vaporised? No
such clarifications were presented in the published
reports. In addition, hydroflotation was used in one
trial (Beretta et al., 1998) but not in the other
(Alborzi et al., 2004). These remarks illustrate
the great difficulty of standardising a surgical
intervention, since each individual surgeon develops
his or her own modification of a standard technique, e.g. for
dissection, haemostasis and/or management of complications.
There have been some attempts, though,
to comprehensively standardise the technical steps
of surgical interventions (Kapiteijn et al., 1999).

Another point which needs addressing is the difficulty
of blinding or masking a surgical procedure,
combined with the legal obligation of the treating
physician to obtain informed consent. This can be a
major problem if "soft" outcome measures, e.g. pain
or quality of life, are assessed through self-reporting
by unblinded patients or determined by unblinded
assessors. The emotional consequences of
knowing one's treatment may significantly affect the
reporting of outcomes. "Sham" surgical procedures
have been conceived in the past to try to overcome
this issue. Because ethical problems may arise with
such procedures (Moseley et al., 2002), "hard" outcome
measures, e.g. live birth rate, are mostly preferred.
The latter are relatively independent of the knowledge
of a patient's treatment, but there is still a
need for the outcome assessors to be blinded to the
allocated treatments.

A second category of problems concerns the interpretation
of the results of the trials. Mixing
data from trials conducted by less experienced
surgeons with data from trials done by more expert
ones may negatively affect the magnitude of the effect
estimate, since differences in treatment outcomes
are expected: it is logical that a
surgical intervention has a more favorable outcome
when the provider is more experienced. Finally, a
common problem concerns the statistical
power of surgical trials: a large survey of 90 "negative"
surgical trials found that only 24% had sufficient
power to detect relative risk reductions of 50%
and only 29% reported a formal sample size calculation
(Dimick et al., 2001). A power calculation is
currently considered an absolute must in the
proper conduct of an RCT. It is one of the
main items which a reviewer has to judge in a
clinical trial, since adequate power underpins the
results and therefore the interpretation of the trial's
data.
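
To make the power argument concrete, the following minimal sketch computes the number of participants per arm needed to detect a 50% relative risk reduction with a two-sided two-proportion z-test. The control event rate of 40%, the 5% significance level and the 80% power are illustrative assumptions, not values taken from the surveyed trials, and the function name is hypothetical. The formula is the usual normal-approximation one without continuity correction.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Participants needed in each arm for a two-sided two-proportion z-test.

    Normal-approximation formula without continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_control + p_treated) / 2            # average proportion under H0
    term_null = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z_beta * math.sqrt(
        p_control * (1 - p_control) + p_treated * (1 - p_treated)
    )
    n = (term_null + term_alt) ** 2 / (p_control - p_treated) ** 2
    return math.ceil(n)


# Hypothetical scenario: event rate halved from 40% to 20%.
n_per_arm = sample_size_per_arm(0.40, 0.20)
print(n_per_arm)
```

Under these assumptions roughly 80 participants per arm are required; smaller relative reductions or lower baseline rates increase the number quickly, which is one reason why small surgical trials are so often underpowered.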

When is it ethical to design an RCT in surgery? 

It is essential to define the circumstances under
which an RCT can be conducted to determine
whether a surgical procedure is more effective
than other surgical or non-surgical treatments.
We usually undertake trials because we hypothesise
that a new surgical procedure can be better than the
current standard practice in terms of efficacy, safety
or cost, but we are uncertain whether this hypothesis
is true or false. The limits of uncertainty include the
possibility that the new technique may be no better,
or even worse, than the current standard practice.
True uncertainty on the part of the expert professional
community about the balance of benefits and harms of
two or more treatments for a well-defined study population
has been described as "clinical equipoise"
(Freedman, 1987). When clinicians, methodologists
and ethics committees or institutional review boards
are uncertain whether an intervention is beneficial,
an RCT is judged to be appropriate.

In addition we need to consider the uncertainty in
the patient-clinician relationship, through active
patient participation in the inclusion/exclusion
process. If the patient is certain that a specified treatment
is better or safer, then the patient should not be
included in the trial. Similarly, if the physician
judges that a particular patient is clearly better off
with a particular treatment, he or she is ethically obliged
to inform the patient and to assist in seeking the most
appropriate treatment, excluding the patient from the trial.
If both physician and patient are uncertain which
treatment to choose, the patient should be offered
inclusion in the trial.

The above principle of equipoise should always
be considered as the gold standard in deciding
whether or not to design an RCT. Some consider the
evidence provided by non-randomised studies as an
ethical basis to discard the need for further research.
Their certainty based on the results of studies with a
high risk of bias should nevertheless be put aside in
deference to the reasoned uncertainty existing within
the larger community of experts (Haynes et al.,
2006).

Fig. 1. RCTs on the effectiveness of reproductive surgery: number of RCTs per decade (1970-1979, 1980-1989, 1990-1999, 2000-2010)

RCTs in reproductive surgery: the present state 

Our group has published a systematic review on the 
effectiveness of reproductive surgery for treating 
female infertility (Bosteels et al., 2010). We con- 
ducted a search in the Cochrane Library, MEDLINE 
and EMBASE for RCTs on reproductive surgery 
in subfertile women. Our findings demonstrated a
steady increase from 1970 to 2010 in the number of
RCTs on the effectiveness of reproductive surgery
per decade (Figure 1).

Nearly 75% of the included 63 RCTs had an ade- 
quate random sequence generation and nearly 50% 
had adequate allocation concealment (Figure 2). The 
percentage of RCTs on reproductive surgical inter- 
ventions with adequate allocation concealment (26 
out of 63 studies or 41%) was similar (p = 0.67) to 
the findings of the review of gynaecological surgical 
trials available in the Cochrane Library (42 out of 
94 studies or 45%) (Selman et al, 2008). 
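
The comparison of these two proportions can be checked with a simple two-sided z-test for independent proportions, which is equivalent to an uncorrected chi-square test. Which test was actually used is not stated, so the choice of test and the function name below are assumptions; the counts are those quoted above.

```python
import math


def two_proportion_p_value(events_a, total_a, events_b, total_b):
    """Two-sided p-value of a pooled z-test comparing two independent proportions."""
    p_a = events_a / total_a
    p_b = events_b / total_b
    pooled = (events_a + events_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Adequate allocation concealment: 26 of 63 reproductive surgery RCTs
# versus 42 of 94 gynaecological surgical RCTs.
p_value = two_proportion_p_value(26, 63, 42, 94)
print(round(p_value, 2))
```

Under this assumption the sketch gives a p-value of about 0.67, in line with the value quoted in the text.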

The proportion of trials with adequate random sequence
generation nearly doubled from the pre- to the
post-CONSORT era (RR 1.7; 95% CI 0.98-3.1) (Figure 3),
although the difference narrowly missed statistical
significance (p = 0.06). Although the proportion of
RCTs in the field of reproductive surgery with
adequate allocation concealment also nearly doubled
from the pre- (4 out of 16 studies or 25%)
to the post-CONSORT era (22 out of 47 studies or
47%), the current sample size in our review is too
small to draw definitive conclusions (RR 1.9, 95%
CI 0.76-4.6) (Figure 3). Despite the non-significant
p-value (p = 0.17) our data are nevertheless consistent
with the findings of the review of gynaecological
surgical RCTs in the Cochrane Library, which did
demonstrate both an important and statistically significant
increase (p = 0.002) (Selman et al., 2008).
The absence of evidence of a better methodological
quality concerning blinding pre- versus post-CONSORT
(RR 1.0, 95% CI 0.23-4.6) illustrates the great
difficulty of adequate blinding in surgical trials
(Figure 3). The methodological quality of the trials
on reproductive surgery as determined by random
sequence generation, allocation concealment and
blinding has improved after the CONSORT statement
(RR 1.7, 95% CI 1.1-2.7); this difference was
statistically significant
(p = 0.03) (Figure 3).
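
As a worked check of the allocation concealment comparison, the sketch below computes a risk ratio with a log-scale Wald 95% confidence interval and a two-sided p-value from the counts given in the text (22 of 47 post-CONSORT versus 4 of 16 pre-CONSORT trials). The function name is hypothetical, and the use of this particular interval is an assumption: for a single 2x2 table it coincides with the Mantel-Haenszel fixed-effect estimate labelled in Figure 3, but the software actually used is not described here.

```python
import math


def risk_ratio_with_ci(events_post, total_post, events_pre, total_pre, z_crit=1.96):
    """Risk ratio (post vs. pre) with a log-scale Wald CI and two-sided p-value."""
    rr = (events_post / total_post) / (events_pre / total_pre)
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(
        1 / events_post - 1 / total_post + 1 / events_pre - 1 / total_pre
    )
    lower = math.exp(math.log(rr) - z_crit * se_log_rr)
    upper = math.exp(math.log(rr) + z_crit * se_log_rr)
    z = math.log(rr) / se_log_rr
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rr, lower, upper, z, p_value


# Adequate allocation concealment: 22/47 post-CONSORT vs. 4/16 pre-CONSORT trials.
rr, lower, upper, z, p_value = risk_ratio_with_ci(22, 47, 4, 16)
print(f"RR {rr:.1f}, 95% CI {lower:.2f}-{upper:.1f}, Z = {z:.2f}, p = {p_value:.2f}")
```

The wide interval, which spans values from a modest decrease to a more than fourfold increase, shows why a sample of 63 trials cannot settle the question even though the point estimate suggests a near doubling.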

Live birth rate was reported as the primary out- 
come measure in 16 out of 63 studies or 25% of the 
included RCTs. 

In 7 out of 15 topics there was evidence of a 
significant effect for primary outcomes; in 5 out of 
15 topics there was some evidence of effect for 
primary outcomes along with some evidence gaps; 
in 3 out of 15 topics there was insufficient or no 
evidence. A summary of the grading of the evidence
for different topics in reproductive surgery is presented
in Table 1.

Discussion and future perspectives 

The limited and poor quality evidence provided by 
63 RCTs indicated a positive role for some surgical 
reproductive interventions. Overall the methodolog- 
ical quality of the RCTs published after the 
CONSORT statement in 1996 has improved but this 
conclusion should be made with caution given the 
limited numbers of the included trials in our system- 
atic review. In addition it is evident that not every 
methodological problem has been solved. Since reproductive
medicine was one of the first domains
where the need for evidence-based practice was
stressed (Vandekerckhove et al., 1993), it seems
logical that research in reproductive surgery should
also be further exposed to the gold standard of
RCTs. We agree with others that evidence-based
reproductive surgery "is no passing fad" (Johnson et
al., 2008).

Fig. 2. Methodological quality: risk of bias across studies

In many publications on the methodological
aspects of studies, the concealment of allocation
to the treatment and the control group has been
consistently shown to be the single most important
factor in assessing the quality of RCTs (Farquhar and
Vail, 2006). Nevertheless several large studies assessing
the use of allocation concealment in different
topic areas and in subfertility trials have found that this
item is reported infrequently (Jüni et al., 1999; Moher et al.,
1995; Schulz et al., 1994; Kjaergard et al., 2001).
This should be a major concern for trialists designing
future RCTs in surgery. In contrast, while the absence
of blinding is almost inherent to
surgical trials, blinding has not been consistently
shown to affect the magnitude of the estimated treatment
effect (Jüni et al., 1999; Moher et al.,
1995; Schulz et al., 1994; Kjaergard et al., 2001).
The quality of the generation of the randomisation
sequence, like blinding, has not
been shown to be of major importance in causing
substantial bias (Jüni et al., 2001).

Fig. 3. RCTs with adequate random sequence generation, allocation concealment and blinding before vs. after CONSORT (1996)
Forest plot of the risk ratio (M-H, Fixed, 95% CI), post-CONSORT versus pre-CONSORT, study: Bosteels 2010. Values recoverable from the plot:
1.1.1 Random sequence generation: post-CONSORT 36 events out of 47; test for overall effect: Z = 1.90 (P = 0.06)
1.1.2 Allocation concealment: post-CONSORT 22 events out of 47; test for overall effect: Z = 1.36 (P = 0.17)
1.1.3 Blinding: test for overall effect: Z = 0.03 (P = 0.98)
Test for subgroup differences: Chi² = 0.50, df = 2 (P = 0.78), I² = 0%

Table 1. Grading of evidence of the randomised studies in reproductive surgery.

| Topic under review | RCTs | Number of participants | Conclusions for primary outcomes (evidence category) |
|---|---|---|---|
| Laparoscopic treatment for subfertility associated with rAFS I/II endometriosis | 2 | 437 | Laparoscopic excision/ablation and adhesiolysis improves the chance for live birth and ongoing pregnancy (E). |
| Treatment of endometriomata by excision or ablation | 2 | 88 | The excision of endometriotic cysts significantly improves the chance for spontaneous conception at 12 months (EG). |
| Treatment of endometriomata prior to IVF | 1 | 99 | There is no evidence of an effect in favor of removing endometriomata prior to IVF (G). |
| Laparoscopic drilling for induction of ovulation in PCOS | 6 / 5 / 5 | 439 / 166 / 181 | There is no evidence of a treatment effect of LOD (6-12 months follow-up) versus gonadotropin injections (3-6 cycles) for the ongoing pregnancy rates (EG). There are significantly fewer multiple pregnancies with LOD (E). There is no evidence of an effect of bilateral compared to unilateral LOD (EG). |
| Surgical treatment for tubal disease in women with hydrosalpinx due to undergo IVF | 4 / 2 | 455 / 209 | Laparoscopic salpingectomy for hydrosalpinges prior to IVF significantly improves the chances for pregnancy (all definitions) (E). Tubal occlusion is at least as effective as an alternative (EG). |
| Prevention of adhesions after previous reproductive surgery | [illegible] | 74 / 36 | There is no evidence of a treatment effect for second-look laparoscopy with adhesiolysis in improving pregnancy rates after failed tubal microsurgery (E). There is some benefit for the use of hyaluronic acid gel after laparoscopic myomectomy (G). |
| Surgical treatment of fibroids for subfertility | 2 / 1 | 309 / 87 | Hysteroscopic myomectomy doubles the pregnancy rate compared to expectant management in subfertile women with submucosal fibroids (EG). The removal of intramural or subserosal fibroids tends to increase the pregnancy rate, but the effect is not statistically significant (G). |
| Laparoscopy prior to IUI | 1 | 154 | There is no evidence of a treatment effect of laparoscopy prior to IUI (E). |
| Hysteroscopic removal of polyps | 1 | 215 | Hysteroscopic removal of polyps visible on ultrasound increases the pregnancy rates in women undergoing IUI (E). |
| Hysteroscopy in women with IVF failure | 2 | 941 | Hysteroscopy prior to IVF doubles the clinical pregnancy rates in patients with 2 failed IVF attempts (E). |

Considering the outcome measures, the majority 
of trials in subfertility and reproductive surgery do 
not report live birth outcomes as their primary out- 
come. This problem has already been highlighted by 
others (Vail and Gardner, 2003). It could be argued 
that all future trials on the effectiveness of reproduc- 
tive surgical interventions should report live birth 
rate as the primary outcome measure since it is the 
single most important outcome of interest for 
couples undergoing fertility treatment. Ideally, the
cumulative live birth rate, using life table analysis,
should be described, as it accounts for the time to
pregnancy and allows periods when the patient was
not actively seeking to conceive to be subtracted.
Time-to-event data are, however, troublesome to
pool statistically in meta-analyses. Moreover, the
other outcome measures of interest in reproductive 
trials e.g. pregnancy and miscarriage rates should not 
be considered inferior since some conditions 
amenable to surgery may have an indirect impact on 
fertility, e.g. septate uterus which increases the 
probability of miscarriage. 

Finally, the correct use of evidence statements
should be encouraged. A common error observed in
many studies is the confusion between "significant"
and "important" or "clinically relevant". A result is
statistically significant if the difference observed
between the study and the control samples is sufficiently
convincing to signify a real difference in the
population of which the sample is representative. A
result is important or clinically relevant if the
magnitude of the effect estimate is large enough to
constitute a real difference between a control and
study intervention for a given outcome. Ideally,
authors and trialists should predefine minimally
important clinical differences, based on estimates or
trade-offs by physicians and/or patients of what really
constitutes an important improvement of the outcome
under study. If the sample size is large enough,
a clinically unimportant or even trivial difference
may be statistically significant, while in
contrast clinically relevant differences may not be
statistically significant if the sample size is too small.
A second common error is the misinterpretation of
a statistically non-significant finding. "Negative
trials" do not exist! The correct conclusion is
the absence of evidence of a particular
effect and not the evidence of its absence (Altman and
Bland, 1995; Alderson and Chalmers, 2003).
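
The interplay between sample size, statistical significance and clinical relevance can be illustrated with a small numerical sketch. All trial sizes and event counts below are hypothetical, the function name is an assumption, and the pooled two-proportion z-test is used only as a convenient normal approximation.

```python
import math


def two_sided_p(events_a, n_a, events_b, n_b):
    """Two-sided p-value of a pooled z-test for two independent proportions."""
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (events_a / n_a - events_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical trials with the same small absolute difference (33% vs. 30%).
p_small_trial = two_sided_p(33, 100, 30, 100)          # 100 patients per arm
p_large_trial = two_sided_p(3300, 10000, 3000, 10000)  # 10,000 patients per arm

# Hypothetical small trial in which the event rate doubles (50% vs. 25%).
p_doubling = two_sided_p(10, 20, 5, 20)                # 20 patients per arm

print(p_small_trial > 0.05, p_large_trial < 0.05, p_doubling > 0.05)
```

The same three-point difference is non-significant in the small trial and significant in the very large one, while a doubling of the event rate remains non-significant with 20 patients per arm. In none of these cases does the p-value say whether the difference matters to patients; that judgement requires a predefined minimally important difference.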

The methodological quality of surgical trials can
be improved either by training surgeons
in clinical epidemiology and evidence-based medicine
or by employing epidemiologists in surgical units
where clinical research is being carried out (Urschel
et al., 2001; Madhok et al., 2002). The evidence
from our recent systematic review is consistent with
this viewpoint.

In conclusion, true progress in the field of reproductive
surgery needs a balanced combination of surgical
skills and a drive for innovation, together with the
exposure of clinical research to the undoubted
validity of evidence-based medicine.